The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge

Authors

  • He-Da Wang
  • Teng Zhang
  • Ji Wu
Abstract

This article describes the final solution of team monkeytyping, which finished in second place in the YouTube-8M video understanding challenge. The dataset used in this challenge is a large-scale benchmark for multi-label video classification. We extend the work in [1] and propose several improvements for frame sequence modeling. We propose a network structure called Chaining that can better capture the interactions between labels. We also report our approaches to handling multi-scale information and attention pooling. In addition, we find that using the output of a model ensemble as a side target in training can boost single-model performance. We report our experiments with bagging, boosting, cascade, and stacking, and propose a stacking algorithm called attention weighted stacking. Our final submission is an ensemble of 74 sub-models, all of which are listed in the appendix.
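
The abstract does not spell out how the ensemble output enters training as a side target. One plausible reading, given here only as a minimal sketch, is a loss that blends cross-entropy against the ground-truth labels with cross-entropy against the ensemble's predicted probabilities. The function names and the `side_weight` mixing coefficient are hypothetical and do not come from the paper.

```python
import math


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over labels; targets may be soft values in [0, 1]."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)


def side_target_loss(labels, ensemble_probs, model_probs, side_weight=0.5):
    """Blend the ground-truth loss with a loss against ensemble predictions.

    side_weight is an assumed mixing coefficient, not a value from the paper.
    """
    hard = binary_cross_entropy(labels, model_probs)          # main target
    soft = binary_cross_entropy(ensemble_probs, model_probs)  # side target
    return (1.0 - side_weight) * hard + side_weight * soft


# Toy example with four labels for a single video.
labels = [1.0, 0.0, 0.0, 1.0]
ensemble_probs = [0.9, 0.2, 0.05, 0.7]
model_probs = [0.8, 0.3, 0.1, 0.6]
loss = side_target_loss(labels, ensemble_probs, model_probs)
```

Setting `side_weight` to 0 recovers ordinary training on the ground-truth labels, which makes the side target easy to switch off for comparison.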

Similar resources

The YouTube-8M Kaggle Competition: Challenges and Methods

We took part in the YouTube-8M Video Understanding Challenge hosted on Kaggle and achieved 10th place in less than one month. In this paper, we present an extensive analysis of and solution to the underlying machine-learning problem based on frame-level data, where major challenges are identified and corresponding preliminary methods are proposed. It’s noteworthy that, with merely ...

An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform

Large-scale datasets have played a significant role in the progress of neural networks and deep learning. YouTube-8M is one such benchmark dataset for general multilabel video classification. It was created from over 7 million YouTube videos (450,000 hours of video) and includes video labels from a vocabulary of 4716 classes (3.4 labels/video on average). It also comes with pre-extracted audio &...

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding

This paper describes our solution for the video recognition task of the Google Cloud & YouTube-8M Video Understanding Challenge, which ranked 3rd. Because the challenge provides pre-extracted visual and audio features instead of the raw videos, we mainly investigate various temporal modeling approaches to aggregate the frame-level features for multi-label video recognition. Our system c...

UTS submission to Google YouTube-8M Challenge 2017

In this paper, we present our solution to Google YouTube-8M Video Classification Challenge 2017. We leveraged both video-level and frame-level features in the submission. For video-level classification, we simply used a 200-mixture Mixture of Experts (MoE) layer, which achieves GAP 0.802 on the validation set with a single model. For frame-level classification, we utilized several variants of r...

Cultivating DNN Diversity for Large Scale Video Labelling

We investigate factors controlling DNN diversity in the context of the “Google Cloud and YouTube-8M Video Understanding Challenge”. While it is well-known that ensemble methods improve prediction performance, and that combining accurate but diverse predictors helps, there is little knowledge on how to best promote & measure DNN diversity. We show that diversity can be cultivated by some unexpec...

Journal:
  • CoRR

Volume abs/1706.05150

Publication date 2017